provably adaptive reinforcement
Review for NeurIPS paper: Provably adaptive reinforcement learning in metric spaces
Summary and Contributions: This paper studies reinforcement learning (RL) problems on large state and action spaces that are endowed with a metric. They key assumption is that the optimal state-action value function, Q*, is Lipschitz smooth with respect to that metric. The setting is that of an episodic, H-stage Markov decision process, in which the learner must choose an action for each stage of each episode while achieving low regret against an optimal policy. Previously, an algorithm was proposed for this problem based on learning the Q* function on an adaptive discretization of the state-action space that becomes steadily finer on important regions of the space. The regret of this algorithm was thought to depend on the packing number of the state-action space.
Review for NeurIPS paper: Provably adaptive reinforcement learning in metric spaces
This paper is about model-free RL where the state-action state is a metric space. An improved analysis of an existing algorithm (with some modifications) is shown to achieve a regret that scales with the zooming dimension of the metric space, instead of the covering dimesion. A general consensus among reviewers emerged that this theoretical RL paper is well executed, and provides a reasonable though not groundbreaking contribution to the RL literature.
Provably adaptive reinforcement learning in metric spaces
We study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the zooming dimension of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.
Provably adaptive reinforcement learning in metric spaces
Cao, Tongyi, Krishnamurthy, Akshay
We study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the \emph{zooming dimension} of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.